[REPORT] Multicloud and on-premises data transfers at scale with AWS DataSync #AWSreInvent #STG353
I participated in the Builders' Session for AWS DataSync. In this post, I will briefly introduce this session.
Overview
Join this builders’ session to immerse yourself in the world of multi-cloud and on-premises data transfers. Learn how to configure and perform a data transfer from an on-premises NFS server and a publicly accessible Google Cloud Storage bucket that is hosting a public dataset to Amazon S3. AWS DataSync makes it fast and simple to migrate your data from other clouds or on-premises NFS servers to AWS as part of your business workflow. Walk away with a step-by-step guide on how to scale out DataSync tasks using multiple DataSync agents. You must bring your laptop to participate.
Report
Agenda
- Single DataSync task and agent
  - Google Cloud Storage to Amazon S3
  - On-premises to Amazon S3
- Multiple agents for a single task
  - Multiple agents per task
- Maximize bandwidth and copy large datasets with multiple tasks
  - Multiple tasks scale out agents
Workshop
The environment was prepared in advance by CloudFormation. I started by allowing inbound HTTP (port 80) from my IP address to the DataSync agent's security group, which is required for DataSync agent activation.
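As a reference, the same inbound rule can be added with boto3. This is a minimal sketch, assuming a hypothetical security group ID and a placeholder for your own public IP:

```python
import boto3

ec2 = boto3.client("ec2")

# Hypothetical values: replace with the DataSync agent's security group ID
# and your own public IP address as a /32 CIDR.
AGENT_SG_ID = "sg-0123456789abcdef0"
MY_IP_CIDR = "198.51.100.10/32"

# Allow inbound HTTP (port 80) from my IP only; this port is used
# during DataSync agent activation.
ec2.authorize_security_group_ingress(
    GroupId=AGENT_SG_ID,
    IpPermissions=[
        {
            "IpProtocol": "tcp",
            "FromPort": 80,
            "ToPort": 80,
            "IpRanges": [
                {"CidrIp": MY_IP_CIDR, "Description": "DataSync agent activation"}
            ],
        }
    ],
)
```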
Activate DataSync agents
DataSync > Agents > Create agent
Two agents were created, but I did not have time to run the task with both of them.
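The console activation corresponds to the CreateAgent API. Here is a minimal boto3 sketch, assuming you have already obtained an activation key from the agent VM over port 80; the key below is purely hypothetical:

```python
import boto3

datasync = boto3.client("datasync")

# Hypothetical activation key retrieved from the agent VM over HTTP (port 80).
activation_key = "AAAAA-BBBBB-CCCCC-DDDDD-EEEEE"

# Register and activate the agent in this account and region.
response = datasync.create_agent(
    ActivationKey=activation_key,
    AgentName="Agent-1",
)
print(response["AgentArn"])
```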
Data transfer to AWS from Google Cloud Storage
In this case, we will transfer data from Google Cloud Storage to Amazon S3. We will use a single DataSync agent to start the DataSync task and observe the task metrics.
Check the Google Cloud Storage bucket
These are the files we will transfer.
Create DataSync task
DataSync > Tasks > Create task
Configure source location
- Source location options: Create a new location
- Location type: Object storage
- Agents: Agent-1
- Server: storage.googleapis.com
- Bucket name: gcp-public-data-arco-era5
- Folder: /co/single-level-reanalysis.zarr/
- Authentication: Requires credentials unchecked
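The source settings above map directly onto the CreateLocationObjectStorage API. A minimal boto3 sketch, assuming a hypothetical agent ARN (Google Cloud Storage is reached through its XML API endpoint over HTTPS):

```python
import boto3

datasync = boto3.client("datasync")

# Hypothetical ARN of the agent created earlier.
AGENT_ARN = "arn:aws:datasync:us-east-1:111122223333:agent/agent-0example1234567890"

source = datasync.create_location_object_storage(
    ServerHostname="storage.googleapis.com",  # GCS XML API endpoint
    ServerProtocol="HTTPS",
    ServerPort=443,
    BucketName="gcp-public-data-arco-era5",
    Subdirectory="/co/single-level-reanalysis.zarr/",
    AgentArns=[AGENT_ARN],
    # No AccessKey/SecretKey: the bucket hosts a public dataset,
    # so "Requires credentials" stays unchecked.
)
print(source["LocationArn"])
```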
Configure destination location
- Destination location options: Create a new location
- Location type: Amazon S3
- S3 bucket: datasync-s3-workshop
- S3 storage class: Standard
- Folder: gcp-to-s3-with-single-agent/
- IAM role: Click Autogenerate button
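The destination can be created the same way with CreateLocationS3. A minimal sketch, assuming a hypothetical account ID and an IAM role equivalent to the one the Autogenerate button creates:

```python
import boto3

datasync = boto3.client("datasync")

# Hypothetical ARNs: the workshop bucket and an IAM role that grants
# DataSync access to it (the console's Autogenerate button creates this role).
S3_BUCKET_ARN = "arn:aws:s3:::datasync-s3-workshop"
BUCKET_ACCESS_ROLE_ARN = "arn:aws:iam::111122223333:role/datasync-s3-access-role"

destination = datasync.create_location_s3(
    S3BucketArn=S3_BUCKET_ARN,
    Subdirectory="/gcp-to-s3-with-single-agent/",
    S3StorageClass="STANDARD",
    S3Config={"BucketAccessRoleArn": BUCKET_ACCESS_ROLE_ARN},
)
print(destination["LocationArn"])
```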
Configure settings
- Task Name: gcp-to-s3-with-single-agent
- Verify data: Verify only the data transferred
- Set bandwidth limit: Use available
Configure the data transfer as follows.
Under Specific files and folders, use Add pattern to copy only files under specific folders whose names begin with a specific prefix:
- /stl1/10*
- /stl2/10*
- /stl3/10*
- /stl4/10*
- Copy object tags: OFF
In Logging, click Autogenerate to create a CloudWatch log group and a resource policy that allows DataSync to write logs to it.
Review the settings and create the task with Create task.
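Putting the settings above together, this is a minimal boto3 sketch of the equivalent CreateTask call; the location and log group ARNs are the hypothetical ones from the previous steps:

```python
import boto3

datasync = boto3.client("datasync")

# Hypothetical ARNs carried over from the previous steps.
SOURCE_LOCATION_ARN = "arn:aws:datasync:us-east-1:111122223333:location/loc-0source1234567890"
DEST_LOCATION_ARN = "arn:aws:datasync:us-east-1:111122223333:location/loc-0dest123456789012345"
LOG_GROUP_ARN = "arn:aws:logs:us-east-1:111122223333:log-group:/aws/datasync:*"

task = datasync.create_task(
    Name="gcp-to-s3-with-single-agent",
    SourceLocationArn=SOURCE_LOCATION_ARN,
    DestinationLocationArn=DEST_LOCATION_ARN,
    CloudWatchLogGroupArn=LOG_GROUP_ARN,
    Options={
        "VerifyMode": "ONLY_FILES_TRANSFERRED",  # Verify only the data transferred
        "BytesPerSecond": -1,                    # Use available bandwidth
    },
    # Include filters: copy only files beginning with "10" under /stl1 to /stl4.
    # In the API the patterns are combined into one "|"-delimited string.
    Includes=[
        {
            "FilterType": "SIMPLE_PATTERN",
            "Value": "/stl1/10*|/stl2/10*|/stl3/10*|/stl4/10*",
        }
    ],
)
print(task["TaskArn"])
```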
Execute the DataSync task
When the task status becomes Available, click Start, and then choose Start with defaults.
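Starting with defaults corresponds to StartTaskExecution with no overrides. A minimal sketch using the hypothetical task ARN from above:

```python
import boto3

datasync = boto3.client("datasync")

# Hypothetical task ARN from the CreateTask call above.
TASK_ARN = "arn:aws:datasync:us-east-1:111122223333:task/task-0example1234567890"

# "Start with defaults" = no option or filter overrides.
execution = datasync.start_task_execution(TaskArn=TASK_ARN)
print(execution["TaskExecutionArn"])
```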
Once the task has started, we can check its progress under History.
We can see that the data throughput was approximately 202 MB/s, the transfer took about 6 minutes, and files were copied at a rate of about 209 files/second.
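The same progress information is available from the DescribeTaskExecution API. A minimal polling sketch, using the hypothetical execution ARN returned above:

```python
import time

import boto3

datasync = boto3.client("datasync")

# Hypothetical task execution ARN returned by StartTaskExecution.
EXECUTION_ARN = (
    "arn:aws:datasync:us-east-1:111122223333:"
    "task/task-0example1234567890/execution/exec-0example1234567890"
)

# Poll until the execution finishes, printing transfer counters as it runs.
while True:
    result = datasync.describe_task_execution(TaskExecutionArn=EXECUTION_ARN)
    print(
        f"status={result['Status']} "
        f"files={result.get('FilesTransferred', 0)} "
        f"bytes={result.get('BytesTransferred', 0)}"
    )
    if result["Status"] in ("SUCCESS", "ERROR"):
        break
    time.sleep(30)
```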
Checking the S3 bucket, we found that the data had been transferred as configured.
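Listing the destination prefix confirms the same thing from the SDK. A short sketch against the workshop bucket:

```python
import boto3

s3 = boto3.client("s3")

# List the objects DataSync wrote under the destination folder.
paginator = s3.get_paginator("list_objects_v2")
pages = paginator.paginate(
    Bucket="datasync-s3-workshop",
    Prefix="gcp-to-s3-with-single-agent/",
)
for page in pages:
    for obj in page.get("Contents", []):
        print(obj["Key"], obj["Size"])
```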
Conclusion
The Builders' Session is a 60-minute workshop where you can easily get hands-on experience with AWS services, so when I attend re:Invent I always choose services I don't usually work with or ones I want to catch up on. The DataSync session was repeated many times, which suggests strong interest from people who want to learn about migration services for implementing migrations. Using AWS DataSync, we were able to experience a data transfer in just a few steps.